AI Battle: Deepseek vs ChatGPT - A Comprehensive Analysis
Which AI Performs Better? Let's Uncover the Truth!
The Story Behind This Analysis
"In the fast-paced world of AI, two titans battle for dominance: Deepseek and ChatGPT. Each promises unparalleled intelligence, lightning-fast responses, and human-like understanding. But the real question remains...
Which AI truly delivers the best experience?"
Imagine a world where AI assistants handle billions of queries daily. Some users rave about their accuracy, while others complain about hallucinations and slow responses. Who should we trust?
This project dives deep into real-world user interactions to uncover the strengths and weaknesses of both AI models.
What This Notebook Covers
Exploratory Data Analysis (EDA): Uncover hidden trends and insights
Interactive Visualizations: Compare AI performance dynamically
User-Based Filtering & Search: Analyze engagement and preferences
Outlier Detection & Handling: Clean and refine the data for accuracy
Session Tracking: Understand long-term user behavior and AI adoption
By the end of this notebook, you'll have a data-driven understanding of how Deepseek and ChatGPT compare across multiple dimensions!
Let the AI Battle Begin!
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
# List every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
import seaborn as sns
import matplotlib.pyplot as plt
df=pd.read_csv('deepseek_vs_chatgpt.csv')
df
| | Date | Month_Num | Weekday | AI_Platform | AI_Model_Version | Active_Users | New_Users | Churned_Users | Daily_Churn_Rate | Retention_Rate | ... | Session_Duration_sec | Device_Type | Language | Response_Accuracy | Response_Speed_sec | Response_Time_Category | Correction_Needed | User_Return_Frequency | Customer_Support_Interactions | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-09-21 | 9 | Saturday | ChatGPT | GPT-4-turbo | 500000 | 25000 | 25000 | 0.05 | 0.95 | ... | 40 | Mobile | es | 0.7842 | 3.30 | Standard | 0 | 6 | 2 | Antarctica (the territory South of 60 deg S) |
| 1 | 2024-09-21 | 9 | Saturday | ChatGPT | GPT-4-turbo | 500000 | 25000 | 25000 | 0.05 | 0.95 | ... | 24 | Laptop/Desktop | zh | 0.8194 | 3.28 | Standard | 1 | 2 | 2 | Ukraine |
| 2 | 2024-09-21 | 9 | Saturday | ChatGPT | GPT-4-turbo | 500000 | 25000 | 25000 | 0.05 | 0.95 | ... | 34 | Mobile | en | 0.8090 | 3.07 | Standard | 0 | 2 | 0 | Grenada |
| 3 | 2024-09-21 | 9 | Saturday | ChatGPT | GPT-4-turbo | 500000 | 25000 | 25000 | 0.05 | 0.95 | ... | 18 | Mobile | fr | 0.8233 | 3.06 | Standard | 0 | 9 | 0 | Guyana |
| 4 | 2024-05-16 | 5 | Thursday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 10 | Mobile | de | 0.9366 | 1.48 | Fast | 0 | 9 | 3 | India |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 34 | Laptop/Desktop | zh | 0.9791 | 0.60 | Instant | 0 | 7 | 2 | Seychelles |
| 9996 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 19 | Laptop/Desktop | en | 0.9132 | 0.83 | Instant | 0 | 5 | 0 | Christmas Island |
| 9997 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 29 | Laptop/Desktop | de | 0.9516 | 0.94 | Instant | 0 | 10 | 2 | Ethiopia |
| 9998 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 21 | Mobile | de | 0.9359 | 0.83 | Instant | 0 | 5 | 3 | Puerto Rico |
| 9999 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 58 | Mobile | fr | 0.9399 | 0.76 | Instant | 1 | 7 | 1 | Kyrgyz Republic |
10000 rows × 28 columns
df.columns
Index(['Date', 'Month_Num', 'Weekday', 'AI_Platform', 'AI_Model_Version',
'Active_Users', 'New_Users', 'Churned_Users', 'Daily_Churn_Rate',
'Retention_Rate', 'User_ID', 'Query_Type', 'Input_Text',
'Input_Text_Length', 'Response_Tokens', 'Topic_Category', 'User_Rating',
'User_Experience_Score', 'Session_Duration_sec', 'Device_Type',
'Language', 'Response_Accuracy', 'Response_Speed_sec',
'Response_Time_Category', 'Correction_Needed', 'User_Return_Frequency',
'Customer_Support_Interactions', 'Region'],
dtype='object')
df.describe()
| | Month_Num | Active_Users | New_Users | Churned_Users | Daily_Churn_Rate | Retention_Rate | Input_Text_Length | Response_Tokens | User_Rating | User_Experience_Score | Session_Duration_sec | Response_Accuracy | Response_Speed_sec | Correction_Needed | User_Return_Frequency | Customer_Support_Interactions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.000000 | 1.000000e+04 | 10000.000000 | 10000.000000 | 10000.000000 | 1.000000e+04 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 9621.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 7.128900 | 1.196255e+06 | 100508.750000 | 35395.150000 | 0.035228 | 9.500000e-01 | 6.260700 | 274.765100 | 4.394700 | 1.626706 | 28.533700 | 0.850287 | 2.356651 | 0.144600 | 5.530600 | 1.476800 |
| std | 3.559712 | 7.444465e+05 | 85584.077151 | 14849.189585 | 0.014999 | 1.054765e-14 | 1.188561 | 130.077225 | 0.734551 | 0.491296 | 14.090348 | 0.072755 | 1.303743 | 0.351715 | 2.867906 | 1.120887 |
| min | 1.000000 | 2.000000e+05 | 12500.000000 | 4000.000000 | 0.020000 | 9.500000e-01 | 4.000000 | 50.000000 | 3.000000 | 0.480000 | 5.000000 | 0.654200 | 0.330000 | 0.000000 | 1.000000 | 0.000000 |
| 25% | 4.000000 | 6.500000e+05 | 35000.000000 | 25000.000000 | 0.020000 | 9.500000e-01 | 6.000000 | 162.000000 | 4.000000 | 1.230000 | 17.000000 | 0.801800 | 1.250000 | 0.000000 | 3.000000 | 0.000000 |
| 50% | 8.000000 | 9.500000e+05 | 52500.000000 | 35000.000000 | 0.050000 | 9.500000e-01 | 7.000000 | 276.000000 | 5.000000 | 1.710000 | 27.000000 | 0.862200 | 2.070000 | 0.000000 | 6.000000 | 1.000000 |
| 75% | 10.000000 | 1.700000e+06 | 170000.000000 | 49000.000000 | 0.050000 | 9.500000e-01 | 7.000000 | 386.250000 | 5.000000 | 2.070000 | 38.000000 | 0.905000 | 3.450000 | 0.000000 | 8.000000 | 2.000000 |
| max | 12.000000 | 3.050000e+06 | 305000.000000 | 61000.000000 | 0.050000 | 9.500000e-01 | 8.000000 | 500.000000 | 5.000000 | 2.280000 | 60.000000 | 0.997200 | 5.190000 | 1.000000 | 10.000000 | 3.000000 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 28 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   Date                           10000 non-null  object
 1   Month_Num                      10000 non-null  int64
 2   Weekday                        10000 non-null  object
 3   AI_Platform                    10000 non-null  object
 4   AI_Model_Version               10000 non-null  object
 5   Active_Users                   10000 non-null  int64
 6   New_Users                      10000 non-null  int64
 7   Churned_Users                  10000 non-null  int64
 8   Daily_Churn_Rate               10000 non-null  float64
 9   Retention_Rate                 10000 non-null  float64
 10  User_ID                        10000 non-null  object
 11  Query_Type                     10000 non-null  object
 12  Input_Text                     10000 non-null  object
 13  Input_Text_Length              10000 non-null  int64
 14  Response_Tokens                10000 non-null  int64
 15  Topic_Category                 10000 non-null  object
 16  User_Rating                    10000 non-null  int64
 17  User_Experience_Score          10000 non-null  float64
 18  Session_Duration_sec           10000 non-null  int64
 19  Device_Type                    10000 non-null  object
 20  Language                       10000 non-null  object
 21  Response_Accuracy              9621 non-null   float64
 22  Response_Speed_sec             10000 non-null  float64
 23  Response_Time_Category         10000 non-null  object
 24  Correction_Needed              10000 non-null  int64
 25  User_Return_Frequency          10000 non-null  int64
 26  Customer_Support_Interactions  10000 non-null  int64
 27  Region                         10000 non-null  object
dtypes: float64(5), int64(11), object(12)
memory usage: 2.1+ MB
df.tail(5)
| | Date | Month_Num | Weekday | AI_Platform | AI_Model_Version | Active_Users | New_Users | Churned_Users | Daily_Churn_Rate | Retention_Rate | ... | Session_Duration_sec | Device_Type | Language | Response_Accuracy | Response_Speed_sec | Response_Time_Category | Correction_Needed | User_Return_Frequency | Customer_Support_Interactions | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9995 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 34 | Laptop/Desktop | zh | 0.9791 | 0.60 | Instant | 0 | 7 | 2 | Seychelles |
| 9996 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 19 | Laptop/Desktop | en | 0.9132 | 0.83 | Instant | 0 | 5 | 0 | Christmas Island |
| 9997 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 29 | Laptop/Desktop | de | 0.9516 | 0.94 | Instant | 0 | 10 | 2 | Ethiopia |
| 9998 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 21 | Mobile | de | 0.9359 | 0.83 | Instant | 0 | 5 | 3 | Puerto Rico |
| 9999 | 2024-05-17 | 5 | Friday | DeepSeek | DeepSeek-Chat 1.5 | 1700000 | 170000 | 34000 | 0.02 | 0.95 | ... | 58 | Mobile | fr | 0.9399 | 0.76 | Instant | 1 | 7 | 1 | Kyrgyz Republic |
5 rows × 28 columns
df.isnull().sum()
Date                               0
Month_Num                          0
Weekday                            0
AI_Platform                        0
AI_Model_Version                   0
Active_Users                       0
New_Users                          0
Churned_Users                      0
Daily_Churn_Rate                   0
Retention_Rate                     0
User_ID                            0
Query_Type                         0
Input_Text                         0
Input_Text_Length                  0
Response_Tokens                    0
Topic_Category                     0
User_Rating                        0
User_Experience_Score              0
Session_Duration_sec               0
Device_Type                        0
Language                           0
Response_Accuracy                379
Response_Speed_sec                 0
Response_Time_Category             0
Correction_Needed                  0
User_Return_Frequency              0
Customer_Support_Interactions      0
Region                             0
dtype: int64
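The missing-value count can be turned into a share of the dataset with simple arithmetic. The sketch below uses only the two figures reported in the outputs above (10000 rows, 9621 non-null `Response_Accuracy` values); the variable names are illustrative.

```python
# Figures taken from the df.info() / df.isnull().sum() outputs above.
total_rows = 10000
non_null_accuracy = 9621

# Rows where Response_Accuracy is missing.
missing_accuracy = total_rows - non_null_accuracy

# Fraction of the dataset affected by the gap.
missing_share = missing_accuracy / total_rows

print(f"{missing_accuracy} missing values ({missing_share:.2%} of rows)")
```

With under 4% of one column missing, imputing a central value is a common choice, although whether it is appropriate depends on why the values are missing.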
# Assign the result back: fillna(..., inplace=True) on a column selection is chained assignment and stops working in pandas 3.0
df['Response_Accuracy'] = df['Response_Accuracy'].fillna(df['Response_Accuracy'].median())
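Median imputation can be written without `inplace=True` by assigning the filled column back to the frame. A minimal sketch on a toy frame (the values are made up for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing accuracy value (illustrative values only).
toy = pd.DataFrame({"Response_Accuracy": [0.80, np.nan, 0.90, 0.70]})

# Series.median() skips NaN by default, so this is the median of 0.70, 0.80, 0.90.
median_accuracy = toy["Response_Accuracy"].median()

# Assign the filled column back instead of mutating a column selection in place.
toy["Response_Accuracy"] = toy["Response_Accuracy"].fillna(median_accuracy)

print(toy["Response_Accuracy"].tolist())
```

Assigning back operates on the original frame directly, which is the form the pandas warning recommends.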
def detect_outliers_iqr(df, column):
    # Flag rows lying more than 1.5 * IQR outside the quartiles
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
    return outliers
# Columns to check for outliers
num_cols = ['Response_Accuracy', 'User_Rating', 'Session_Duration_sec', 'Response_Speed_sec']
# Detect outliers in selected numerical columns
for col in num_cols:
    outliers = detect_outliers_iqr(df, col)
    print(f"Outliers in {col}: {len(outliers)}")
Outliers in Response_Accuracy: 16
Outliers in User_Rating: 0
Outliers in Session_Duration_sec: 0
Outliers in Response_Speed_sec: 0
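To make the IQR rule concrete, here is a small worked sketch on a toy Series (the numbers are invented for illustration). It computes the same quartiles and fences as `detect_outliers_iqr` and shows which value falls outside them.

```python
import pandas as pd

# Toy sample: six typical values and one extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 50])

# Quartiles with pandas' default linear interpolation.
q1 = values.quantile(0.25)   # 11.0
q3 = values.quantile(0.75)   # 12.5
iqr = q3 - q1                # 1.5

# Fences sit 1.5 * IQR below Q1 and above Q3.
lower_bound = q1 - 1.5 * iqr  # 8.75
upper_bound = q3 + 1.5 * iqr  # 14.75

# Anything outside the fences is flagged.
outliers = values[(values < lower_bound) | (values > upper_bound)]
print(outliers.tolist())  # [50]
```

Because the fences depend on the quartiles, a column with a narrow interquartile range (such as a rating clustered at 4 and 5) can flag few or no points even when some values look unusual.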
def replace_outliers(df, column, method="median"):
    # Overwrite values outside the IQR fences with the column median (or mean); modifies df in place
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    if method == "median":
        replacement = df[column].median()
    else:
        replacement = df[column].mean()
    df[column] = np.where((df[column] < lower_bound) | (df[column] > upper_bound), replacement, df[column])
# Apply to numerical columns
for col in num_cols:
    replace_outliers(df, col, method="median")
print("Outliers replaced with median values!")
Outliers replaced with median values!
def remove_outliers(df, column):
    # Keep only rows whose value lies inside the IQR fences
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
# Apply to selected numerical columns
for col in num_cols:
    df = remove_outliers(df, col)
print("Outliers removed!")
Outliers removed!
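The detection, replacement, and removal helpers above all repeat the same quartile arithmetic. One possible way to factor it out is sketched below; `iqr_bounds` and `drop_outliers` are hypothetical names introduced here, and the toy frame is invented for illustration. The sketch also reports how many rows the filter drops, which the notebook's own cell does not print.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) IQR fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(frame, column):
    """Keep rows whose value lies within the fences (inclusive on both ends)."""
    lower, upper = iqr_bounds(frame[column])
    return frame[frame[column].between(lower, upper)]

# Toy data: one session is far longer than the rest.
toy = pd.DataFrame({"Session_Duration_sec": [10, 12, 11, 13, 12, 11, 50]})
cleaned = drop_outliers(toy, "Session_Duration_sec")
rows_dropped = len(toy) - len(cleaned)

print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

Printing the row count before and after makes it easy to see whether a removal step actually changed anything, which is useful when it runs after a replacement step on the same columns.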
plt.figure(figsize=(12, 6))
for i, col in enumerate(num_cols, 1):
    plt.subplot(2, 2, i)
    sns.boxplot(x=df[col], color='green')
    plt.title(f"Boxplot After Outlier Handling: {col}")
plt.tight_layout()
plt.show()
plt.figure(figsize=(8, 5))
sns.histplot(df['User_Rating'], bins=10, kde=True, color='blue')
plt.title("Distribution of User Ratings")
plt.xlabel("User Rating")
plt.ylabel("Frequency")
plt.show()
plt.figure(figsize=(8, 5))
sns.boxplot(x=df['Response_Accuracy'], color='green')
plt.title("Boxplot of Response Accuracy")
plt.show()
plt.figure(figsize=(8, 5))
# Assign hue explicitly; passing palette without hue is deprecated in seaborn
sns.barplot(x=df['AI_Platform'], y=df['Active_Users'], hue=df['AI_Platform'], palette="coolwarm", legend=False)
plt.title("Active Users per AI Platform")
plt.xlabel("AI Platform")
plt.ylabel("Number of Active Users")
plt.show()
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df['Response_Accuracy'], y=df['User_Rating'], hue=df['AI_Platform'])
plt.title("User Rating vs Response Accuracy")
plt.xlabel("Response Accuracy")
plt.ylabel("User Rating")
plt.show()
plt.figure(figsize=(8, 5))
# Assign hue explicitly; passing palette without hue is deprecated in seaborn
sns.boxplot(x=df['AI_Platform'], y=df['Response_Accuracy'], hue=df['AI_Platform'], palette="Set2", legend=False)
plt.title("Response Accuracy by AI Platform")
plt.show()
plt.figure(figsize=(10, 5))
sns.countplot(x=df['AI_Model_Version'], hue=df['AI_Platform'], palette="Set1")
plt.xticks(rotation=45)
plt.title("AI Model Version Distribution")
plt.show()
import plotly.express as px
fig = px.scatter(df, x='Response_Speed_sec', y='User_Experience_Score', color='AI_Platform',
                 title="Response Speed vs User Experience Score", size='User_Experience_Score')
fig.show()
fig = px.bar(df, x='Region', y='Active_Users', color='AI_Platform',
             title="Active Users per Region", barmode="group")
fig.show()
from wordcloud import WordCloud, STOPWORDS
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
platforms = df['AI_Platform'].unique()
plt.figure(figsize=(12, 6))
# One word cloud per platform, drawn side by side
for i, platform in enumerate(platforms, 1):
    platform_text = " ".join(df[df['AI_Platform'] == platform]['Input_Text'].dropna())
    platform_wordcloud = WordCloud(width=800, height=400, background_color='black',
                                   colormap="plasma").generate(platform_text)
    plt.subplot(1, len(platforms), i)
    plt.imshow(platform_wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"Most Searched Terms - {platform}")
plt.tight_layout()
plt.show()
def generate_wordcloud(column_name, bg_color="white", cmap="coolwarm"):
    plt.figure(figsize=(10, 5))
    # Combine all text from the column
    text = " ".join(df[column_name].dropna().astype(str))
    # Create WordCloud
    wordcloud = WordCloud(width=800, height=400, background_color=bg_color,
                          stopwords=STOPWORDS, colormap=cmap).generate(text)
    # Display WordCloud
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"WordCloud for {column_name}", fontsize=14)
    plt.show()
generate_wordcloud("Query_Type")
generate_wordcloud("Topic_Category", bg_color="black", cmap="plasma")
generate_wordcloud("Language", bg_color="white", cmap="viridis")
generate_wordcloud("Device_Type", bg_color="white", cmap="inferno")
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="Date", y="Active_Users", hue="AI_Platform", marker="o", linewidth=2)
plt.xticks(rotation=45)
plt.title("Active Users Trend Over Time (Deepseek vs ChatGPT)", fontsize=14)
plt.xlabel("Date")
plt.ylabel("Active Users")
plt.legend(title="AI Platform")
plt.grid(True)
plt.show()
fig = px.bar(df, x="AI_Model_Version", y="Active_Users", color="AI_Platform",
             title="AI Model Version Usage Over Time", barmode="stack")
fig.show()
sns.jointplot(data=df, x="Response_Accuracy", y="User_Rating", hue="AI_Platform", kind="scatter", height=7)
plt.suptitle("User Rating vs Response Accuracy", fontsize=14)
plt.show()
plt.figure(figsize=(10, 5))
sns.heatmap(df.head(25).pivot_table(index="Session_Duration_sec", columns="User_Experience_Score",
                                    values="Active_Users", aggfunc="sum").fillna(0), cmap="coolwarm", annot=True)
plt.title("User Engagement Heatmap: Session Duration vs Experience Score")
plt.xlabel("User Experience Score")
plt.ylabel("Session Duration (sec)")
plt.show()
fig = px.choropleth(df, locations="Region", locationmode="country names",
                    color="Active_Users", hover_name="Region",
                    title="Most Active Regions (Deepseek vs ChatGPT)", color_continuous_scale="viridis")
fig.show()
fig = px.scatter(df, x="Response_Speed_sec", y="User_Experience_Score",
                 size="User_Experience_Score", color="AI_Platform",
                 hover_data=["AI_Model_Version", "Response_Accuracy"],
                 title="Response Speed vs User Experience Score")
fig.show()
fig = px.sunburst(df, path=["AI_Platform", "Topic_Category", "Query_Type"],
                  values="Active_Users", color="AI_Platform",
                  title="AI Query Distribution Across Topics")
fig.show()
fig, ax1 = plt.subplots(figsize=(12, 6))
# First line plot for Daily Churn Rate
sns.lineplot(data=df, x="Date", y="Daily_Churn_Rate", hue="AI_Platform", marker="o", ax=ax1)
ax1.set_ylabel("Daily Churn Rate (%)", color="red")
ax1.tick_params(axis="y", labelcolor="red")
# Second line plot for Retention Rate on a secondary y-axis that shares the x-axis
ax2 = ax1.twinx()
sns.lineplot(data=df, x="Date", y="Retention_Rate", hue="AI_Platform", marker="s", linestyle="dashed", ax=ax2)
ax2.set_ylabel("Retention Rate (%)", color="blue")
ax2.tick_params(axis="y", labelcolor="blue")
plt.title("Daily Churn Rate vs Retention Rate Over Time", fontsize=14)
plt.xticks(rotation=45)
plt.show()
plt.figure(figsize=(12, 6))
sns.countplot(data=df, y="Query_Type", hue="AI_Platform", order=df["Query_Type"].value_counts().index, palette="viridis")
plt.title("Most Common Query Types by AI Platform", fontsize=14)
plt.xlabel("Count")
plt.ylabel("Query Type")
plt.show()
fig = px.scatter(df, x="Response_Accuracy", y="User_Experience_Score", size="Active_Users", color="Topic_Category",
                 hover_data=["AI_Platform"], title="Response Accuracy vs User Experience by Topic")
fig.show()
df_device = df["Device_Type"].value_counts().reset_index()
df_device.columns = ["Device_Type", "Count"]
plt.figure(figsize=(8, 8))
plt.pie(df_device["Count"], labels=df_device["Device_Type"], autopct="%1.1f%%", startangle=90, colors=["blue", "green", "orange", "red"])
plt.title("Device Type Distribution")
plt.show()
plt.figure(figsize=(10, 5))
# Assign hue explicitly; passing palette without hue is deprecated in seaborn
sns.boxplot(data=df, x="AI_Platform", y="Customer_Support_Interactions", hue="AI_Platform", palette="coolwarm", legend=False)
plt.title("Customer Support Interactions Across AI Platforms", fontsize=14)
plt.xlabel("AI Platform")
plt.ylabel("Support Interactions")
plt.show()
plt.figure(figsize=(10, 6))
heatmap_data = df.head(150).pivot_table(index="Weekday", columns="Month_Num", values="Active_Users", aggfunc="sum")
sns.heatmap(heatmap_data, cmap="coolwarm", annot=True, fmt=".0f")
plt.title("Active Users Heatmap: Weekdays vs Months", fontsize=14)
plt.xlabel("Month")
plt.ylabel("Weekday")
plt.show()
plt.figure(figsize=(10, 6))
# Assign hue explicitly; passing palette without hue is deprecated in seaborn
sns.violinplot(data=df, x="AI_Platform", y="User_Return_Frequency", hue="AI_Platform", palette="pastel", inner="quartile", legend=False)
plt.title("User Return Frequency Across AI Platforms", fontsize=14)
plt.xlabel("AI Platform")
plt.ylabel("User Return Frequency")
plt.show()
import plotly.graph_objects as go
# Ensure 'Date' is in datetime format
df["Date"] = pd.to_datetime(df["Date"])
# Select rows per platform. The comparison is case-insensitive because the dataset
# spells the platform "DeepSeek", so an exact match on "Deepseek" returns no rows.
platform = df["AI_Platform"].str.strip().str.lower()
deepseek_data = df[platform.eq("deepseek")].sort_values("Date")
chatgpt_data = df[platform.eq("chatgpt")].sort_values("Date")
fig = go.Figure()
# Add Deepseek data if available
if not deepseek_data.empty:
    fig.add_trace(go.Scatter(x=deepseek_data["Date"],
                             y=deepseek_data["Active_Users"],
                             mode="lines+markers", name="Deepseek", line=dict(color="green")))
# Add ChatGPT data if available
if not chatgpt_data.empty:
    fig.add_trace(go.Scatter(x=chatgpt_data["Date"],
                             y=chatgpt_data["Active_Users"],
                             mode="lines+markers", name="ChatGPT", line=dict(color="red")))
fig.update_layout(title="Active Users Over Time (Deepseek vs ChatGPT)",
xaxis_title="Date",
yaxis_title="Active Users",
template="plotly_dark")
fig.show()
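Each row of the dataset is a single interaction, so a date can appear many times per platform. A line chart is usually easier to read with one point per platform per day. The sketch below shows one way to collapse the rows first. It assumes `Active_Users` is a daily figure repeated on every row of that day (consistent with the rows displayed earlier, but not verified for the whole file), and the toy frame is invented for illustration.

```python
import pandas as pd

# Toy rows: several interactions share the same date and daily user count.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2024-05-17", "2024-05-16", "2024-05-16", "2024-09-21", "2024-09-21"]
    ),
    "AI_Platform": ["DeepSeek", "DeepSeek", "DeepSeek", "ChatGPT", "ChatGPT"],
    "Active_Users": [1700000, 1700000, 1700000, 500000, 500000],
})

# One row per (platform, date); max() is used because the value is assumed
# to be identical within a day, so any reducer that picks one value would do.
daily = (
    toy.groupby(["AI_Platform", "Date"], as_index=False)["Active_Users"]
       .max()
       .sort_values(["AI_Platform", "Date"])
)

print(daily)
```

The resulting `daily` frame can be passed to the same `go.Scatter` calls, with each platform's rows already in chronological order.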
import streamlit as st
import pandas as pd
import plotly.express as px
from datetime import datetime
import os
# **Session Tracking File**
SESSION_FILE = "user_sessions.csv"
# **Load Dataset**
@st.cache_data
def load_data():
    return pd.read_csv("deepseek_vs_chatgpt.csv")  # Replace with actual dataset
df = load_data()
# **Function to Load User Sessions**
def load_session_data():
    if os.path.exists(SESSION_FILE):
        return pd.read_csv(SESSION_FILE)
    else:
        return pd.DataFrame(columns=["timestamp", "username", "search", "platform"])
# **Function to Save Session Data**
def save_session(username, search, platform):
    session_data = load_session_data()
    new_entry = pd.DataFrame([{"timestamp": datetime.now(), "username": username, "search": search, "platform": platform}])
    session_data = pd.concat([session_data, new_entry], ignore_index=True)
    session_data.to_csv(SESSION_FILE, index=False)
# **Sidebar Navigation**
st.sidebar.title("Navigation")
page = st.sidebar.radio("Go to", ["Admin Dashboard", "AI Search Dashboard"])
# **Admin Dashboard**
if page == "Admin Dashboard":
    st.title("Admin Dashboard - User Tracking")
    # **Load User Sessions**
    session_data = load_session_data()
    # **Total Logins**
    st.subheader("Total Logins")
    total_logins = session_data["username"].nunique()
    st.metric(label="Total Users Logged In", value=total_logins)
    # **Most Active Users**
    st.subheader("Most Active Users")
    user_counts = session_data["username"].value_counts().reset_index()
    user_counts.columns = ["User", "Login Count"]
    st.dataframe(user_counts)
    # **Recent Sessions**
    st.subheader("Recent User Sessions")
    st.dataframe(session_data.tail(10))
    # **User Search & Filtering**
    st.subheader("Search User Activity")
    search_user = st.text_input("Enter username to filter activity")
    if search_user:
        user_activity = session_data[session_data["username"] == search_user]
        st.dataframe(user_activity)
# **AI Search Dashboard**
else:
    st.title("AI Search Dashboard")
    st.sidebar.title("Filter Options")
    selected_platform = st.sidebar.selectbox("Select AI Platform", df["AI_Platform"].unique())
    search_query = st.sidebar.text_input("Search Query Type")
    username = st.sidebar.text_input("Enter your username")  # Manual user input
    if username:
        save_session(username, search_query, selected_platform)
    filtered_df = df[df["AI_Platform"] == selected_platform]
    if search_query:
        # regex=False treats the query as literal text, so input such as "(" does not raise a regex error
        filtered_df = filtered_df[filtered_df["Query_Type"].astype(str).str.contains(search_query, case=False, na=False, regex=False)]
    st.subheader("Filtered Data")
    st.dataframe(filtered_df)
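The session log used by the dashboard is a plain CSV, so its behaviour can be checked without launching Streamlit. The sketch below is a simplified variant, not the dashboard's own code: it appends one row per call instead of rereading and rewriting the whole file, writes to a temporary directory, and uses the hypothetical helper names `load_sessions` and `append_session` with made-up usernames.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

COLUMNS = ["timestamp", "username", "search", "platform"]

def load_sessions(path):
    """Read the session log, or return an empty frame if none exists yet."""
    if os.path.exists(path):
        return pd.read_csv(path)
    return pd.DataFrame(columns=COLUMNS)

def append_session(path, username, search, platform):
    """Append one session row; the header is written only for a new file."""
    row = pd.DataFrame([{
        "timestamp": datetime.now().isoformat(),
        "username": username,
        "search": search,
        "platform": platform,
    }])
    row.to_csv(path, mode="a", header=not os.path.exists(path), index=False)

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "user_sessions.csv")
    append_session(log_path, "alice", "coding", "ChatGPT")
    append_session(log_path, "bob", "math", "DeepSeek")
    append_session(log_path, "alice", "math", "DeepSeek")
    sessions = load_sessions(log_path)

# The same aggregates the admin page displays.
unique_users = sessions["username"].nunique()
visits_per_user = sessions["username"].value_counts()

print(unique_users)
print(visits_per_user)
```

Appending avoids rereading the full log on every interaction, which matters because Streamlit reruns the script each time a widget changes. Neither approach guards against two sessions writing at once.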